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Abstract 

Background: Genome-wide expression profiles are altered during biological aging and can describe molecular 
regulation of tissue degeneration. Age-regulated mRNA expression trends from cross-sectional studies could 
describe how aging progresses. We developed a novel statistical methodology to identify age-regulated expression 
trends in cross-sectional datasets. 

Results: We studied six cross-sectional RNA expression profiles from different human tissues. Our methodology, 
capable of overcoming technical and genetic background differences, identified an age-regulation in four of the 
tissues. For the identification of expression trends, five regression models were compared and the quadratic model 
was found as the most suitable for this study. After k-means clustering of the age-associated probes, expression 
trends were found to change at two major age-positions in brain cortex and in Vastus lateralis muscles. The first 
age-position was found to occur during the fifth decade and a later one during the eighth decade. In kidney cortex, 
however, only one age-position was identified correlating with a late age-position. Functional mapping of genes at 
each age-position suggests that calcium homeostasis and lipid metabolisms are initially affected and subsequently, 
in elderly mitochondria, apoptosis and hormonal signaling pathways are affected. 

Conclusions: Our results suggest that age-associated temporal changes in human tissues progress at distinct 
age-positions, which differ between tissues and in their molecular composition. 
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Background 

Biological aging represents an age-dependent decline of 
multiple physiological functions, tissue degenerations 
and molecular changes. In higher organisms, the aging 
process is determined by a combination of genetics and 
environmental factors, and therefore is considered as a 
complex process. The complexity of the biological aging 
is also contributed to by multiple molecular inputs [1]. 
Molecular regulation and regulators of aging have been 
identified in simple animal models. However, it is still 
unclear whether they have a similar contribution to hu- 
man aging [2]. In contrast to simple animals, the aging 
of humans is characterized by high variations. In order 
to understand this process, robust and unbiased statis- 
tical analyses should be developed. 
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To describe aging as a time- dependent process, changes 
in physiological, cellular or molecular parameters with 
chronological age should be elucidated. For an accurate 
description, these parameters must be quantitative and 
should be analyzed with appropriate statistical tests. 
Ideally, the rate of aging-associated changes would be ex- 
tracted from longitudinal datasets, where changes in one 
subject are followed over the course of time. In humans, 
however, longitudinal sampling is often impractical. In- 
stead, a cross-sectional dataset that covers a broad age- 
range is an alternative [2]. 

Cross-sectional genome-wide RNA datasets were gener- 
ated from healthy subjects. Genes contributing to tissue 
degeneration have been identified (reviewed in: [2,3]). In 
these studies, linear models or an age group analyses were 
applied to identify age-regulated genes [4-8]. Linear 
models are simple and easy to apply, but make many as- 
sumptions that are doubtful for complex processes, such 
as aging. When describing a time-dependent process with 
a linear model, we assume a constant change over the 
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entire time, which is highly unexpected in biological pro- 
cesses. To investigate when aging initiates and how it pro- 
gresses, additional models should be considered 

We developed a novel five-step methodology to study 
temporal changes in cross-sectional genome-wide RNA 
datasets. In this methodology, the expression trend of 
each probe was determined using a quadratic model and 
its P'Vdlue was calculated in order to filter the age- 
regulated probes. Subsequently, the age-regulated probes 
were clustered such that major age-dependent expres- 
sion trends could be identified (outlined in Figure 1). 
The methodology was developed using a new Vastus 
lateralis {VL) muscles microarray dataset and was evalu- 
ated and validated on post-mortem brain frontal cortex 
(brain cortex) [8], kidney cortex and medulla [6] 
(Table 1). Despite technical and genetic background dif- 
ferences between platforms, similar age-regulated trends 
were found in VL muscles and in brain cortex. In both 
datasets, the expression trends bend at two distinct age- 
positions, at midlife and in old age. In kidney cortex, 
however, trends bend only in old age. Based on these re- 
sults we suggest that during aging, significant changes in 
RNA expression progress through distinct time points, 
which differ molecularly. 

Results 

Identification of age-regulated datasets and age- 
regulated genes 

Several cross -sectional microarray studies in humans are 
publically available [3]. Those were generated on different 
platforms and present different genetic backgrounds and 
technical variations in RNA isolation and labeling. We 
analyzed RNA microarray datasets from VL muscles of 
healthy individuals [9] brain cortex [8], kidney cortex 
and medulla [6] (datasets are detailed in Additional file 1: 
Table SI). The datasets were separately analyzed using a 
similar procedure: normalization using Variance Sta- 
bilization and Normalization (VSN), log transformation 
and gender correction. A significant age-association (p- 
value < 0.05) was found in VL muscles, brain and kidney 
cortex but not in kidney medulla (Table 1). Importantly, 
genetic background differences and technical variations 
are not expected in the kidney cortex and medulla data- 
sets, since both are from the same platform and in most 
cases, tissues were collected from the same individuals [6]. 
Despite genetic background and technical differences be- 
tween VL muscles and brain cortex, in both datasets the 
age-association of genome-wide expression profiles is the 
most robust. This analysis indicates that reliable age- 
regulated probes would be found in VL muscles, brain 
and kidney cortex, but not in kidney medulla. 

We have used the datasets from VL muscles, brain cor- 
tex and kidney cortex to develop a methodology for age- 
associated RNA expression trends identification, while the 



kidney medulla dataset was used as a methodological 
control. This analysis indicates that in humans age- 
associated genome-wide expression profiles highly differ 
between tissues. 

Identification of expression trends in cross-sectional datasets 

In cross-sectional datasets, expression levels can vary be- 
tween individuals and could constrain the identification of 
significant age-associated expression trends. To overcome 
this limitation, expression levels were smoothed per probe 
prior to the selection of age-associated probes. The probe 
smoothing was carried out per dataset. We compared five 
different smoothing procedures: linear, quadratic and cubic 
regression models in order to identify the most adequate 
for cross-sectional gene expression datasets (Additional file 
1: Figure SIA). The genome-wide fitness of each regression 
model was statistically evaluated for all probes, per platform 
and the fitness was compared between each two models. 
This analysis indicates that the cubic and quadratic model 
have a similar fit, whereas the fitness of the linear model 
was significantly worse (Additional file 1: Figure SIB). The 
robustness of each model was assessed using the leave- 
one-out' cross-validation, indicating that between the four 
regression models, the linear model has the largest error 
rate (Additional file 1: Figure SIB). Together, these analyses 
indicate that a linear model is less suitable to describe age- 
associated changes in cross -sectional datasets, compared 
with quadratic or cubic models. It is important to note that 
quadratic and cubic regression models use fewer assump- 
tions than the linear model, therefore would be more suit- 
able for a dataset in which complex changes occur with 
time. Although both quadratic and cubic models showed 
similar fit, we carried our analysis using the quadratic re- 
gression model, as it is less likely to overfit the data, in 
comparison with the cubic model. 

The significance of age-regulated trends was deter- 
mined per probe, assuming no age-association as the 
null hypothesis. Only probes presenting a j^-value < 0.05 
(unadjusted) were kept for further analysis (Table 1 and 
Additional file 1: Figure S2). Each dataset passed two fil- 
tering steps, therefore ^-values were not corrected for 
multiple testing. In accordance with the globaltest p- 
value, the number of age-regulated probes in brain, VL 
muscles and kidney cortex was higher than in kidney 
medulla, which did not pass the globaltest /^-value 
threshold (Table 1). Only limited overlap (-10%) of age- 
regulated genes was found between each two-tissue com- 
bination and less than 2% overlap was found between 
three-tissue combinations (Additional file 1: Table S3). 
This suggests that molecular aging differs among tissues, 
therefore the analysis was carried out per tissue. 

To statistically assess the smoothing procedure and 
the significance of the age-regulated probes, the samples in 
each dataset were randomly permuted, generating artificial 
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Figure 1 (See legend on next page.) 
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(See figure on previous page.) 

Figure 1 Dominant expression trends in VL muscles, brain cortex and Icidney cortex. Plots sliow tine major trends identified from tlie 
significant age-regulated probes using /c-means with Euclidean distance as metric (A) or absolute correlation as metric (B). The X-axis represents 
the age in years and the Y-axis the fold change (log2) after normalization to the average. Shown are the major age-regulated trends, where red 
circles interconnected by red lines represent the cluster centroids, blue and black lines are the 95th or 99th percentiles, respectively. The number 
of probes in each cluster is indicated between brackets. Arrowheads indicate the age-positions. 



datasets. One hundred permuted datasets were generated 
per tissue, in order to reduce the impact of permutations 
that do not highly differ from the chronologically ordered 
dataset. The ratio between higher to lower numbers of sig- 
nificant probes in permuted datasets compared with the 
chronologically ordered dataset was considerably small in 
brain, VL muscles and kidney cortex (Table 1). In contrast, 
in the kidney medulla the ratio was much higher (031, 
Table 1), suggesting that the age-regulated probes in this 
dataset may represent noise. 

The number of probes that are found in the original 
data, but not in the average permuted dataset is an esti- 
mation of the amount of truly aging-associated genes 
[10], therefore from the dataset permutation analysis we 
can conclude that in the significantly age-regulated data- 
sets the ratio signal to noise is very small. 

This analysis demonstrates that the filtering step of the 
age-regulated probes from the chronologically ordered 
datasets can be performed with a significant confidence, 
even without correction for multiple testing. In addition, 
this suggests that in the datasets with no significant age- 
association, the fraction of false-positive age-associated 
probes would be high. Moreover, our procedure for age- 
associated probe identification is not highly affected by 
genetic background and platform differences. 

Identification of major age-associated expression trends 
and bend-positions 

To identify trends that are significantly associated with 
chronological age, the filtered probes were clustered 
using /c-means clustering, with Euclidean distance as 
metric. We identified four stable clusters exhibiting 
unique trends (Figure lA and Additional file 1: Figure 
S3B). Interestingly, reciprocal trends bend at similar age 



(Figure lA), suggesting that major expression changes are 
at defined time points. To extrapolate the age at which 
major expression changes occur, we applied /c-means clus- 
tering with absolute correlation as metric, which clusters 
reciprocal trends together. Two major trends were iden- 
tified (Figure IB). Each absolute correlation cluster rep- 
resents two reciprocal trends and the time point at 
which the reciprocal trends intersect within the cluster, 
represents the age where trends bend (Figure IB). In 
addition, this point is also defined as the smallest dis- 
tance between the 95th percentile and the cluster cen- 
troids (Figure IB). We named this point an age- 
position. In VL muscles, two age-positions were identi- 
fied: at age 43 ± 3 and 75 ± 5 (Table 2A). The range of 
each age-position was estimated from 2, 3 and 4/c- 
means absolute correlation clusters. Moreover, in brain 
cortex two distinct age-positions at 53 ± 3 and 77 ± 
3 years were identified (Table 2 A and Figure IB). In 
contrast, in kidney cortex the two age-positions were in- 
distinguishable by age (Table 2A). This analysis suggests 
that age-positions spatially and temporally differ be- 
tween tissues. 

To assess this aspect, we generated from the brain cor- 
tex dataset (26-106 years) three datasets with varied age 
ranges: 36-106, 45-106 and 26-87. The analysis was re- 
peated on these variant datasets starting from the probe 
smoothing step. The occurrence of the first age-position 
in the datasets 'trimmed' from the left (youngest sam- 
ples) was shifted around three years to the right, for each 
10 years of reduction in the dataset age range. Moreover, 
fewer significant probes were associated with the first 
age-position (Figure 2), whilst the second age-position 
was less influenced with respect to the occurrence age 
and the number of probes associated with (Figure 2). In 



Table 1 Description of the datasets used in this study and the statistical evaluation of age-associated probes 



Tissue 



Ciironological datasets 



Permuted datasets (N=100) 



Nr. samples Age range GT p-value ^ 



Age-regulated 



Age-regulated mean ± SE* 



[Ps>Cs/Ps<Cs]A 



VL muscles 


29 


35-89 


0.015 


4101 


2991 ± 86.1 


7/93 [0.07] 


Brain cortex 


30 


26-106 


0.004 


2339 


677 ± 52.2 


3/97 [0.03] 


Kidney cortex 


72 


27-92 


0.002 


2397 


1136 ± 77.2 


11/89 [0.12] 


Kidney medulla 


61 


29-92 


0.205 


1474 


1090 ± 70.0 


24/76 [0.31] 



Table shows the four datasets used in this study in chronological order and permuted samples. The number of samples and the age range (in years) are Indicated. 
* The p-value determined with the globaltest (GT). 

The number of age-regulated probes (p-value < 0.05) was determined using a quadratic regression model and no age-association as null hypothesis. In the 
permuted datasets, the mean and the standard error (SE) from 100 permutations per dataset were calculated. 

^The ratio between permuted samples (Ps) with a higher to lower number of significant probes, compared with chronologically ordered samples (Cs). 
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Table 2 Probes associated with an age-position and the gene overlap in age-positions 


A 


Dataset details 


1st age-position 


2nd age-position 


Tissue 


Nr samples Age-range 


Age 


# Probes 
up 1 down 


Age 


# Probes 
up 1 down 


VL muscles 


29 35-89 


43±3 


2716 
1241 1 1475 


75+5 


1385 
499 1 886 


Brain cortex 


30 26-106 


53±3 


1149 
638 1 511 


77±3 


1190 
392 1 798 


Kidney cortex 


72 27-92 


65±5 


1729 
392 1 854 


70±5 


668 
335 1 333 


B 


1st age-position 






2nd age-position 






VL muscles 


Brain cortex 


VL muscles 




Brain cortex 


VL muscles 


1976 (100%) 


95 (9.4%) 


1094 (100%) 




86 (6.4%) 


Brain cortex 


95 (4.8%) 


1010 (100%) 


68 (6.2%) 




1057 (100%) 


A. Table summarizes the age (in years) of each age-position and the number of probes, as well as the direction of regulation (up or down) in VL muscles, brain 
cortex and kidney cortex. Age-positions ± variations in years are indicated. 

B. Table shows the gene overlap between VL muscles and brain cortex in each age-position. Gene overlap was determined with Entrez ID. The percentage of over- 



lap is indicated between brackets. 



the 45-106 years dataset the first age-position was not 
stable due to the low sample resolution, but the second 
age-position was unchanged (Figure 2). To assess the ef- 
fect of centenarians, another artificial dataset was gener- 
ated by trimming the brain cortex dataset from the right, 
discarding the elderly (>87 years). Also in this dataset, two 
distinct age-positions were identified, but compared with 
the original dataset, the second age-position was more af- 
fected with respect to the occurrence age and the number 
of significant probes (Figure 2). Together this demon- 
strates that in the brain cortex dataset, two distinct age- 
positions where expression trends bend are stable and 
consistent, but the exact point is subject to variation ac- 
cording to the age range of the dataset. 

Next, we examined the effect of the dataset resolution 
on the occurrence of the age-positions. Since the kidney 
cortex dataset contained the largest number of subjects, 
we compared the full dataset (N = 72) with a generated 
dataset that includes only half of the subjects (N = 36; 
Additional file 1: Figure S4). To avoid a change in the 
distribution, the artificial dataset was generated by re- 
moving every other sample, maintaining their distribu- 
tion across decades (Additional file 1: Table S2). Age- 
positions did not differ between the original kidney cor- 
tex and 'half of the dataset (Additional file 1: Figure S4). 
This indicates that the resolution in an evenly age dis- 
tributed dataset has little impact on the occurrence of 
the age-positions. 

The stability of the age-positions and age-associated 
trends was also evaluated in the kidney medulla dataset. 
RNA expression profiles in this dataset did not present a 
significant age-association (Table 1). The trends generated 
by the 1474 probes passing the j^-value filtering, had a 
lower fold-change compared with the age-regulated 



datasets. Moreover, the age-positions were not clear or 
consistent (Figure 3A). As 1474 probes passed the filter 
for significant age-association, but the dataset itself did 
not pass under the globaltest threshold, it is highly pos- 
sible that a considerable amount of these probes repre- 
sent false-positives. 

To further investigate the effect of false positive probes, 
the age-positions were assessed in the permuted data- 
sets from brain cortex (Figure 3B) and kidney cortex 
(Additional file 1: Figure S5), where samples were not 
ordered according to chronological age. In these per- 
muted datasets, age-positions were not clear and not 
consistent. Together this demonstrates that stable and 
well defined age-positions can be found in chronologic- 
ally ordered, age-associated datasets, but not in datasets 
with insignificant age-regulation. 

To YQvibf and validate the specificity of the age-positions 
and to point the limitations of the procedure, this metho- 
dology was applied on several independent and modified 
datasets. For validations of the age-positions in VL 
muscles we used an independent VL muscles dataset, 
where the age range is wider (17-89 years) but the num- 
ber of samples is slightly smaller (N = 25) compared 
with the original VL muscles dataset (N = 29). Also in 
the validation dataset, two distinct age-positions were 
identified and these were close to the age-positions from 
the original dataset (Figure 4 and Additional file 1: 
Figure S6). As expected, the lower sample resolution in 
the validation dataset induces higher variation compared 
with the original dataset (Additional file 1: Figure S6). This 
suggests that age-positions do not result from technical 
or methodological artifacts, but represent a biological 
phenomenon. We noticed that in the datasets with a 
wider age range, the age-positions shifted about 3 years 
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Age-range 1st age-position 



2nd age-position 



26-1 06 Cluster #1 (1 1 49/2339) Cluster #2 (11 90/2339) 
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Figure 2 The effect of the dataset age-range on age-positions. Plots show the expression trends of the significant probes clustered using 
/(-means with absolute correlation as metric in the brain cortex dataset. Trends were generated for the age-ranges: 26-106 (original); 36-106; 
45-106 and 26-87. Red circles interconnected by red lines represent the cluster centroids, blue and black lines are the 95th or 99th percentiles, 
respectively. The number of probes in each cluster is indicated between brackets. Arrowheads denote the age-positions, indicating a bend in the 
expression trend. 



earlier compared with the original dataset (Figure 4). This 
suggests that the age range of the dataset could have an 
impact on the point at which the age-positions occur. 

Molecular composition of early and late age-positions 
is distinct 

For molecular characterization of the two distinct age- 
positions, a comparison between the age-positions in VL 
muscles and brain cortex was made. The overlap of genes 
(Entrez ID) between the tissues at each age-position was 



limited (5-9%; Table 2B). However, a similarity was found 
in the fold-change direction: in both tissues the propor- 
tion of up-regulated and down-regulated probes was simi- 
lar in the 1st age-position, but the down- regulated probes 
were enriched in the 2nd age-position (Table 2A). This 
suggests that different molecular processes regulate tran- 
scriptional changes at each age-position and those differ 
between tissues. 

To assess the function of genes. Gene Ontology (GO) 
enrichment of the significant genes at each age-position 
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Figure 3 Age-positions are not found in datasets without a significant age-regulation. Plots show /c-means absolute correlation clusters 
from kidney medulla dataset (A), or from brain cortex (B). The red circles connected by red lines represent the cluster centroids. The 95th 
percentile depicted in blue represents the major trends in the cluster. The 99th percentile shows the within cluster variation. The number of 
probes per cluster, as opposed to the total number of identified significant probes is denoted in the title of each plot. 



was investigated using unique Entrez IDs (genes are 
listed in Additional file 2: Tables S4-7). Significant en- 
richment of GO terms was determined using Fishers 
exact test in a genome-wide context {p < 0.05) and sub- 
sequently, using hierarchical clustering (a complete list 
of the hierarchical clusters per tissue per age-position is 
found in Additional file 3: Tables S8-11). In agreement 
with the origin of the tissues, the muscle contraction 
and the nerve cell system were the most prominently 
enriched biological processes in both age-positions in VL 
muscles and in brain cortex, respectively (Table 3). The 
vast majority of significantly enriched GO terms differ be- 
tween the two age-positions and between tissues. How- 
ever, calcium transport and lipid metabolism were affected 
in the 1st age-position in both tissues, and in the 2nd age- 
position apoptosis and mitochondria (Table 3). 

Discussion and conclusions 

Declines with age in tissue and cell functionality 
characterize biological aging. In cross-sectional datasets ob- 
tained from healthy humans, a decline in muscle strength 
starts only after midlife. Using two linear models the de- 
cline is significant from the 6th decade and then after it 
linearly progresses [11]. More recently, we identified a 
decline in RNA expression of PolyA RNA binding pro- 
tein 1 (PABPNl) in VL muscles from the fifth decade 
[9]. In agreement with these studies, we found that in 
VL skeletal muscles a major change in expression trends 
occurs during early midlife. In addition, we identified a 
second age, around 70 years, where expression trends 
changed. Coinciding with the 2nd age-position, the 



muscle waste also known as sarcopenia, is common in 
elderly [12]. 

Two distinct age-positions were also found in the brain 
frontal cortex. Our analysis demonstrated that the age- 
positions are consistently found in age-regulated datasets, 
as compared with the non-age-regulated datasets. The 
exact age at which trends change is significantly affected 
by the age-range of the dataset. A shift of ~3 years is to 
be considered based on the age range of the dataset 
(Figure 4B). In contrast, in the kidney cortex only one 
age-position was identified. This suggests that in humans, 
tissue aging differs by temporal and spatial means. As a 
support, very little gene overlap was found between tis- 
sues. Moreover, on the functional groups level, only two 
groups were similar in brain cortex and VL muscles. This 
suggests that aging in humans is not a linear process, but 
progresses through at least two age-positions in skeletal 
muscle and brain tissue. Analysis of the brain cortex 
dataset suggested qualitative changes around age 40 
and around age 70. Our findings are consistent with 
[8], but provide a quantitative description of temporal 
changes during aging. We applied observant tests and 
in-depth analyses, which allow us to conclude that the 
age-positions identified with our methodology are 
stable and are not platform-dependent. Albeit only 
limited similarity was found between the four tissues 
presented in this study, as a proof of concept, conclu- 
sions regarding the differences between tissues during 
aging should be made with a larger comparative study. 
In addition, during aging, cell composition within a tis- 
sue can also change. This study, however does not 
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Figure 4 Summary of the validation and stability of the age-positions. Plots show temporal changes in age-positions in the validation dataset 
for VL muscles (A) and in brain cortex (B). The age-position is denoted with the number of associated probes. Original datasets are denoted 
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range (depicted with a line) and the number of samples per dataset is denoted under the X-axis. Absolute correlation /c-means clustering plots 
for each dataset are shown in the Additional file 1. 



address possible age-regulated changes in cell compo- 
sition within a tissue. 

Based on the datasets used here, the 1st age-position 
in skeletal muscles is around 40 years of age whereas in 
brain it occurs around 50 years of age. Since the sample 
resolution and age-range in both datasets is comparable, 
it suggests that expression changes in skeletal muscles 
occur earlier compared with the brain cortex. Although 
additional studies are required to confirm this observa- 
tion, it may have an impact on approaches to improve 



healthy aging. The 2nd age-position in both datasets is 
found around 70 years of age and it is characterized by 
an enrichment of down-regulated genes, that cluster into 
well known aging- regulated processes [3] such as: apop- 
tosis, mitochondria, hormonal signaling pathways and 
could mark tissue degeneration. The 1st age-position in 
both tissues is enriched with changes in calcium homeo- 
stasis and lipid metabolism. Calcium homeostasis is es- 
sential for the maintenance of muscle and nerve cells, 
and physiological implications have been demonstrated 
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Table 3 A summary of GO terms that are significantly enriched in 1st (early) or 2nd (late) age-positions in VL muscles 
or brain cortex 





1st age-position 


2nd age-position 


VL muscles 


Muscle contraction; 


Contractile fiber; 




Calcium transport; 


Mitochondria; 




Lipid metabolism 


Apoptosis; 

Estrogen receptor signaling; 
Chromatin silencing; 
Phagocytic vesicle; 
RNA processing 


Brain cortex 


Nervous system; 


Nerve cell system; 




Calcium transport; 


Apoptosis; 




Lipid metabolism; 


Mitochondria; 




Actin cytoskeleton; 


Nucleotide biosynthesis; 




Interphase; 


Microtubule cytoskeleton; 




Apoptosis; 


Cell migration; 




Cell migration; 


Ion homeostasis; 




Nucleotide biosynthesis; 


Insulin-like growth factor signaling; 




Energy biogenesis; 


Hormone secretion 




Androgen signaling; 






Protein tyrosine kinase; 






Endocytic vesicle 





A list of significantly enriched GO terms in the early and late age-positions for VL muscles and brain cortex datasets. GO terms are ordered according to their sig- 
nificant enrichment. In bold are similar GO terms in both tissues. A complete list of the GO terms hierarchical trees can be found in Additional file 3: Tables S8-1 1. 



[13,14]. The molecular regulation of calcium homeosta- 
sis to cellular aging is not fully understood and requires 
additional investigation. The impact of lipid metabolisms 
during aging was studied in more detail, as adiposity in- 
creases in a number of tissues [15]. Recent studies in 
centenarian cohorts identified regulators of lipid metab- 
olism as candidates for longevity in humans [16,17]. 
From the list of genes (Additional file 2: Tables S4-7), in 
each age-position additional regulators of aging in humans 
could be identified for functional studies. 

A genome-wide decline in gene expression in the 2nd 
age-position could contribute to tissue degeneration in 
elderly. To assess the impact of the age-positions on 
longevity, we have analyzed a 'trimmed' dataset, exclud- 
ing samples above 87 years of age. Although two age- 
positions were found in this analysis, we argue that we 
cannot address the impact of the late age-position to 
longevity. Due to the nature of cross-sectional studies, 
like those used here, the late age-position could also 
present survivor effects of those individuals. For in- 
stance, it may be possible to observe an expression 
change in the population above age 90, as they become 
part of a more exclusive population. Changes in expres- 
sion may occur either due to level increases in every indi- 
vidual at 90 years of age, or due to the individuals with 
low expression that have small probability of survival until 



this age and above. In other words, it is also possible 
that cross-sectional datasets reveal the enrichment in 
survivor populations. Only longitudinal studies follow- 
ing the same subjects in time will be able to make a dis- 
tinction between these two possibilities and to reveal an 
impact on longevity. 

In statistical modeling there are two competing issues at 
stake. The first is avoiding bias that arises from the model 
assumptions that may not be true. The second is avoiding 
variance that arises from a model that is too flexible, thus 
the model would overfit the data. Previous studies in aging 
microarray datasets applied linear models in cross-sectional 
datasets [6,7,18]. Linear models, however, make many 
assumptions, which are doubtful in complex biological 
processes. It is highly unlikely that trends in biology are 
exactly linear. Here we have carefully compared linear, 
quadratic and cubic models on three different chrono- 
logically ordered datasets and found that the quadratic 
and cubic models outperformed the linear model. Al- 
though we chose to apply a quadratic regression model 
in our studies here, we recognize that more complex 
models could be investigated in future studies. The 
datasets used here are not dense enough for such 
models, thus the use of a dataset with multiple sam- 
ples per year and more uniformly distributed samples 
across age is advised. 
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Methods 

The inclusion criteria for this study are: even data distri- 
bution in every decade (at minimum starting from 
40 years) with a minimum of 25 samples (Additional file 
1: Table SI). In order to minimize the loss of information, 
analyses were carried out on probe ID, covering from 
48826 to 12567 unique probes per dataset, depending on 
the platform used. Subsequently, an age -regulation of 
mRNA profiles in each of the datasets was determined 
using the globaltest [19] where the /^-value calculation as- 
sumes no-age association as the null hypothesis. Gender 
correction was also included into the globaltest model. 

Microarray datasets 

The dataset from VL muscles was generated from healthy 
humans not presenting a chronic disease. All muscle 
biopsies were collected using the Bergstrom needle at 
Canisius-Wilhelmina Hospital, Nijmegen, The Netherlands, 
after an approval of the medical ethical committee 
Arnhem-Nijmegen (CMO nr. 2005/189) and a written 
informed consent from participants, as described in 
[20]. Biopsies were immediately frozen in liquid nitro- 
gen and stored at -80°C before RNA extraction. RNA 
extraction, labeling and hybridization on Illumina Hu- 
man v3.1 arrays were performed as described in [20]. 
Our muscle dataset is deposited for public use (GEO- 
GSE40645). The microarray datasets in this study was 
retrieved from brain frontal cortex: (GEO-GSE1572) 
[8], kidney cortex and kidney medulla, obtained from 
Stanford Microarray Dataset (http://cmgm.stanford.edu/ 
~kimlab/aging_kidney/). In the Rectus abdominis dataset, 
19 samples from other muscles {Deltoid, Arm muscle, Bi- 
ceps, Quadriceps, Fascia, Thigh muscle. Gastrocnemius, 
upper arm muscle) were excluded from this analysis. A 
second dataset from VL muscles (referred here as B) can 
be found at (GSE26605) [21]. Platforms of all datasets are 
detailed in Additional file 1: Table SI. 

Pre-processing 

All datasets were pre-processed with a uniform protocol 
including: variance stabilization and normalization {vsn 
V3.20.0 [21]), limma v3.8.3 [22], lumi v2.4.0 [23], affy 
vl.30 [24], and affy data vl.13.11 packages in R), as well 
as batch correction {ComBat R script available at http:// 
www.bu.edu/jlab/wp-assets/ComBat/Abstract.html) for 
the VL muscles dataset. The principal component ana- 
lysis (PGA), implemented in the princomp function in R 
package was applied in order to identify potential out- 
liers present in the dataset. Eventually, all samples were 
included in the study. In the kidney datasets, duplicates 
as well as individuals without gender assignments were 
excluded (Additional file 1: Table SI). A significant age- 
association ip'Vshie < 0.05) for each dataset was determined 



after gender correction using the globaltest (globaltest 
V5.10.0 package in R) [25]. 

Probe smoothing 

To identify age-related trends, smoothing was carried 
out per probe for each of the available tissues. Several 
smoothing models were considered. Spline functions 
[26] (including: cubic spline, Bezier curves, quadratic B- 
spline) and a local regression method (LOESS) [26] were 
compared with a simple linear regression (examples are 
shown in Additional file 1: Figure SI A). The perform- 
ance of the smoothing methods was statistically evalu- 
ated using leave-one-out' cross-validation (LOOGV) 
with a /c-fold equal to the dataset resolution. The fitted 
values, as well as the residuals were computed for all 
probes in the VL muscles dataset and the average sum of 
squares error represents the fitness measure of each 
smoothing method (Additional file 1: Figure SIB). A de- 
tailed presentation of each model is found in the descrip- 
tion of Additional file 1: Figure SI. Probe smoothing was 
separately applied on each dataset. An overlap of the age- 
regulated genes between tissues was assessed using unique 
Entrez IDs (Additional file 1: Table S3). 

The quadratic curve was chosen over the higher order 
and more complex spline models, as these showed a ten- 
dency to overfit the data due to the larger number of 
parameters in the model. The LOESS function was dis- 
carded due to the visible influence of a "wiggle" effect, 
as the LOESS allows more curvature in denser age 
neighborhoods. An example of this effect is shown in 
Additional file 1: Figure SI A. The absence of control 
points for the fitting curve of the quadratic B-spline 
function is adequate for an unbiased analysis and the 
use of a quadratic model allows more freedom in ex- 
pression trend identification. 

Probe filtering 

Age-regulated probes were filtered based on our quad- 
ratic model, testing the null hypothesis of no age asso- 
ciation versus the quadratic age model described above. 
Only the probes resulting in a p-v3.\ue < 0.05 (un- 
adjusted) were kept for further analysis. Examples of 
significant and non-significant probes are shown in 
(Additional file 1: Figure S2). 

Clustering 

Prior to clustering, the smoothened values of the filtered 
probes per dataset were normalized by subtracting the 
average expression value of a probe from each of its data 
points. This allows clustering based on trend similarity 
rather than expression values. The clustering was per- 
formed using /c-means with Euclidean distance as metric, 
aiming at minimizing the sum of squares error between the 
data points of a probe and the designated cluster centroids. 
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The cluster centroids were randomly initialized by the in- 
ternal mechanisms of the clustering functions, in order to 
respect the unbiased nature of the study. Their final pos- 
ition is obtained once the algorithm converges, based on 
the procedure in [27]. The resulting clusters were evaluated 
for confidence with the 99th and 95th percentile. The 95th 
percentile shows the borders of the dominant trends in a 
cluster and the variations within each trend were evaluated 
by point-wise reference intervals at the 95th and 99th per- 
centile (e.g.. Figure 1, Additional file 1: Figure S3B). The 
stability of the clusters was evaluated by cycles of re- 
clustering from 24, 16, 12, 8, 6 and 4 clusters until obtain- 
ing a minimum number of consistent clusters defining the 
major expression trends. Cluster stability was assessed with 
the sum of squares error between the centroids of a cluster 
and its associated probes. A genetic version of the /c-means 
algorithm was used [28] to evaluate the robustness of the 
classic /c-means algorithm convergence (Additional file 1: 
Figure S3 A). Using the classic /c-means algorithm, im- 
plemented in the R base package cluster, for N = 4 clus- 
ters, 70% of the runs converged exactly to the same 
point. This provides a rough estimation for the mini- 
mum and most stable number of clusters in this ana- 
lysis. A horizontal symmetry of the trends was evaluated 
with absolute correlation as a distance metric in the 
clustering algorithm, using the Kmeans function in the 
R package amap. Age-positions were estimated from 
the intersection of symmetric trends within an absolute 
correlation cluster, where the 95th percentile is closer to 
the centroids. The age range of an age-position was de- 
termined from the /c-means absolute correlation cluster- 
ing in 2, 3 and 4 clusters. 

Functional analysis was carried out with Entrez IDs. 
Probes were annotated per platform and were mapped 
to their corresponding Entrez ID, using the R packages 
lluminaHumanvSBeadID for VL muscles, hgu9Sav2,db 
for the brain frontal cortex and hgul33a.db for the kidney 
dataset. A small percentage of the data was lost, as not all 
probes are annotated and not all annotated probes have 
an Entrez ID (Additional file 1: Table SI). The unique 
genes forming the absolute correlation clusters were 
mapped cluster-wise onto the Gene Ontology (GO) data- 
base using the org.HS.eg.db database from Bioconductor. 
The significant GO terms were selected using Fishers 
exact test in R (^-value < 0.05). The GO term enrichment 
was calculated using as background the entire mapping to 
Entrez IDs per platform. In the brain cortex, the GO term 
/^-values were adjusted for multiple testing using the false 
discovery rate procedure [29]. Subsequently, redundant 
GO terms were removed. Generic (>1000 genes) as well 
as to specific (<10 genes) terms were discarded before the 
following step. The significant GO terms were grouped 
into hierarchical functional clusters using the GO data- 
base structure, which allows the search for interrelated 



GO terms and, moreover, provides information about 
their degree of relation. 

Availability of supporting data 

Our microarray data from VL muscles is deposited as 
(GEO-GSE40645) and publically available. Genes from 
first and second age-positions are available in text format 
in Additional file 2: Tables S4-7 and hierarchical trees of 
the significant GO terms from first and second age- 
positions are found in Additional file 3: Tables S8-11. 

Additional files 



Additional file 1: Figure SI. A comparison of data smoothing 
methods. Figure 52. An example of expression trends in a significant or 
an insignificant probe. Figure S3. K-means clustering using Euclidean 
distance applied to the significant probes of the Vastus lateralis dataset. 
Figure S4. The effect of the dataset size and resolution on age-positions. 
Figure S5. Age-positions are not consistent in permuted datasets. Figure S6. 
Absolute correlation /(-means clustering applied on the significant probes in 
Vastus lateralis. Table SI. A summary of the platform details of the datasets 
that were used in our study. Table S2. Age distribution per decade in four 
datasets that were used for trend analysis. Table S3. Overlap of 
age-associated Entrez ID between tissues. 

Additional file 2: Tables S4-7. Gene lists of Entrez ID and their gene 
symbols, for genes that are associated with the early (Tables S4 and 55) 
or late (Tables S6 and S7) age position in Brain cortex (Tables S4 and 

S6) or VL muscles (Tables S5 and S7), as well as lists of the overlapping 
genes per age-position. 

Additional file 3: Tables S8-11. Hierarchical clustering per tissue and 
per age-position of the significant GO terms. 
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